PsychMods

Codes

  • Notes:
    1. AY2017/2018 Semester 2 and AY2018/2019 Semester 2 bidding data not available.
    2. The bidding statistics are highly non-normal because they are bounded at zero (there cannot be negative bids or a negative number of bidders). Consider zero-inflated or Poisson regression if these statistics are used as dependent variables.
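
As a rough illustration of the second note, a count outcome such as Bidders could be modelled with a Poisson GLM in base R. The data below are simulated purely for illustration (they are not the bidding data), and the choice of Quota as predictor is only an assumption. A zero-inflated model would need an extra package (for example pscl) and is not shown.

```r
# Simulated data for illustration only (not the actual bidding data).
set.seed(1)
toy <- data.frame(Quota = rep(c(20, 40, 80), each = 30))
toy$Bidders <- rpois(nrow(toy), lambda = exp(1 + 0.01 * toy$Quota))

# Poisson regression: log(E[Bidders]) is modelled as linear in Quota.
fit <- glm(Bidders ~ Quota, data = toy, family = poisson(link = "log"))

# Predictions on the response scale are counts and can never be negative,
# unlike predictions from an ordinary linear model.
pred <- predict(fit, newdata = data.frame(Quota = c(20, 80)), type = "response")
```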

Phase 1: Setting Up Environment, Packages And Loading Data

Load myBid.RDS

  • Downloading the data from the API using the code above takes a substantial amount of time.
  • I saved the downloaded data in myBid.RDS and load it directly from my local folder while I worked on the project.
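
The save-once, load-afterwards pattern described above can be sketched as a small helper. `load_or_download` and `download_fn` are hypothetical names introduced here for illustration; `download_fn` stands in for whatever slow API code produces the data.

```r
# Hypothetical helper: download_fn stands in for the slow API download code.
load_or_download <- function(path, download_fn) {
  if (file.exists(path)) {
    return(readRDS(path))      # fast path: reuse the cached copy
  }
  data <- download_fn()        # slow path: run the download once
  saveRDS(data, file = path)   # cache the result for the next session
  data
}

# Usage with a stand-in download function and a temporary file.
cache_file <- tempfile(fileext = ".RDS")
first  <- load_or_download(cache_file, function() data.frame(x = 1:3))
# The second call reads the cache, so the download function is never run.
second <- load_or_download(cache_file, function() stop("should not be called"))
```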

Load Module Information

  • Module information was scattered across different folders.
  • Used a loop to repeat the process of downloading and converting to dataframe across the different folders accessed by the different URLs.
    • The same concept was used to consolidate information about the Module Titles.
library(rjson)   # fromJSON(file = ...) as used below is the rjson interface
library(stringr) # str_detect()

myModInfo <- data.frame() # empty dataframe that acts as a container to be populated with data
for(year in 2011:2018) # loop through each academic year
{
  for(semester in c(1, 2))
  {
    # create the url where data is to be extracted from
    myurl <- paste0("https://api.nusmods.com/", year, "-", year + 1, "/", semester, "/moduleTimetableDeltaRaw.json")
    myjson <- fromJSON(file = url(myurl))
    for(r in seq_along(myjson)) # for each element in the myjson list, append it to myModInfo
    {
      if(isTRUE(str_detect(myjson[[r]]$ModuleCode, "^PL"))) # only keep info if module code begins with PL
      {
        if(myjson[[r]]$Semester %in% c(1, 2)) # only get semester 1 and 2 information
        {
          myModInfo <- rbind(myModInfo, myjson[[r]]) # add to dataframe
        }
      }
      myjson[[r]] <- NA # replace the element with NA to free up some RAM
    }
    cat(year, "Semester", semester, "Done!\n") # progress tracker, one line per semester
  }
}

myTitles <- data.frame() # empty dataframe that acts as a container to be populated with data
for(year in 2014:2018) # loop through each academic year
{
  myurl <- paste0("https://api.nusmods.com/", year, "-", year + 1, "/moduleList.json") # create the url where data is to be extracted from
  myjson <- fromJSON(file = url(myurl))
  for(r in seq_along(myjson)) # for each element in the myjson list, append it to myTitles
  {
    if(isTRUE(str_detect(myjson[[r]]$ModuleCode, "^PL"))) # only keep info if module code begins with PL
    {
      # only keep modules offered in semester 1, semester 2, or both
      if(paste0(myjson[[r]]$Semester, collapse = "|") %in% c("1", "2", "1|2"))
      {
        myTitles <- rbind(myTitles, as.data.frame(myjson[[r]])) # add to dataframe
      }
    }
    myjson[[r]] <- NA # replace the element with NA to free up some RAM
  }
}

myModInfo <- myTitles %>% # add titles information to myModInfo
  select(ModuleCode, ModuleTitle) %>% # select these two columns
  filter(ModuleTitle != "Lab in Applied Psychology") %>%
  distinct() %>% # remove duplicates
  right_join(myModInfo, by = "ModuleCode") # left = myTitles, right = myModInfo

saveRDS(myModInfo, file = "myModInfo.RDS") # save to directory

Load myModInfo.RDS

  • Downloading the data from the API using the code above takes a substantial amount of time.
  • I saved the downloaded data in myModInfo.RDS and load the data directly while I worked on the project.

Phase 2: Filter, Transform And Merge

Module Information

  • Filter information from the dataframe myModInfo.
    • Removing non-Psychology modules.
    • Removing modules without module titles. These are modules that appeared before AY2014/2015 and never resurfaced afterwards.
    • Removing information about tutorials.
Filter
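
A minimal sketch of the three filters listed above, using base R on a toy stand-in for myModInfo. The column name LessonType, its values, the placeholder module codes and titles, and the pattern used to recognise Psychology codes are all assumptions made for illustration; the actual filtering code may differ.

```r
# Toy stand-in for myModInfo; codes, titles and LessonType are placeholders.
toyModInfo <- data.frame(
  ModuleCode  = c("PL0001", "PL0001", "PLS0001", "PL0002"),
  ModuleTitle = c("Title A", "Title A", "Title B", NA),
  LessonType  = c("LECTURE", "TUTORIAL", "LECTURE", "LECTURE"),
  stringsAsFactors = FALSE
)

keep <- grepl("^PL[0-9]", toyModInfo$ModuleCode) &  # assumed pattern for Psychology codes
  !is.na(toyModInfo$ModuleTitle) &                  # drop modules without a title
  toyModInfo$LessonType != "TUTORIAL"               # drop tutorial rows

filtered <- toyModInfo[keep, ]
```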

Bidding Information

  • Filter information from the dataframe myBid.
    • Removing non-Psychology modules, including Roots and Wings (prefixed with PLS-) and Psychology for non-Psychology students (prefixed with PLB-).
    • Removing information from quotas that are reserved and not available for bidding.
    • Removing information from modules with more than one lecture/seminar session.
    • Removing bidding information from non-psychology students.
  • Create new variable ClassNo by transforming from Group such that this information can be used to merge with myModInfo.
Filter & Transform
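
One way to derive ClassNo from Group is to keep only the digits of the group label, so that "LEC1" becomes "1". This is a sketch under that assumption. The example values are made up, and the real transformation may treat other group labels differently.

```r
# Example Group labels (illustrative values only).
Group <- c("LEC1", "LEC2", "LEC1")

# Strip every non-digit character, leaving the class number as a factor.
ClassNo <- factor(gsub("[^0-9]", "", Group))
```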

Merge

  • Combine the information of myModInfo and myBid.
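
A minimal sketch of such a merge in base R, on toy data. The join keys shown here (ModuleCode and ClassNo) are an assumption. The real merge probably also needs the academic year and semester as keys. Module codes and values are placeholders.

```r
# Toy bidding data and toy module information (placeholder values).
toyBid <- data.frame(
  ModuleCode = c("PL0001", "PL0001", "PL0002"),
  ClassNo    = c("1", "2", "1"),
  Bidders    = c(10, 42, 8),
  stringsAsFactors = FALSE
)
toyInfo <- data.frame(
  ModuleCode = c("PL0001", "PL0001", "PL0002"),
  ClassNo    = c("1", "2", "1"),
  DayText    = c("Monday", "Wednesday", "Tuesday"),
  stringsAsFactors = FALSE
)

# Keep every bidding row and attach the matching lecture information.
merged <- merge(toyBid, toyInfo, by = c("ModuleCode", "ClassNo"), all.x = TRUE)
```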

Phase 3: Data Wrangling

  • The variables available in the original data are useful but they are too specific to interpret meaningfully.
  • This section creates new variables based on the original data, which allows us to better discern any trend in the data.
  • Also includes additional wrangling and manipulations to ease the plotting of graphs and analysis later.
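
As a sketch of the kind of derived variables meant here, the block below computes BidPerQuota, Level and Period for four rows whose values are taken from the `str()` output shown later in this page. The exact rules are assumptions inferred from that output: Level is read off the third character of the module code, and lectures starting before 1200 count as "Morning".

```r
# First four rows of the merged data, as far as they appear in the str() output.
toy <- data.frame(
  ModuleCode = c("PL1101E", "PL1101E", "PL2131", "PL2131"),
  Quota      = c(95, 430, 5, 12),
  Bidders    = c(10, 100, 3, 42),
  StartTime  = c(1800, 1800, 1600, 1600),
  stringsAsFactors = FALSE
)

# Bidders per available place.
toy$BidPerQuota <- toy$Bidders / toy$Quota

# Assumed rule: the first digit of the module code gives the level.
toy$Level <- factor(paste("Level", substr(as.character(toy$ModuleCode), 3, 3)))

# Assumed rule: a lecture that starts before 12pm counts as a morning lecture.
toy$Period <- factor(
  ifelse(toy$StartTime < 1200, "Morning", ">=Afternoon"),
  levels = c("Morning", ">=Afternoon")
)
```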

Phase 4: Data Diagnostics

  • Plot univariate histograms and bivariate plots using loops for almost every combination of variables.
  • The graphs from this section are predominantly for diagnostics rather than exploration. They would make little sense if one tried to draw insights from them, because they are aggregated across all other variables.
    • For example: The mean of Bidders is calculated across all academic years, all bidding rounds, all modules…
  • What I am looking out for in this section are odd patterns, like zeroes in places where they shouldn’t be, missing data, highly non-normal data, variables with outliers, etc…
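
A minimal sketch of this kind of diagnostic loop in base R, on made-up data: it counts missing values and zeroes per numeric variable and draws one histogram each. The toy values and the `diagnostics` table are illustrative only, and `pdf(NULL)` is used just so the sketch runs without opening a plot window (drop it to see the plots).

```r
# Toy data with one missing value and one zero (illustrative only).
toy <- data.frame(
  Quota     = c(95, 430, 5, 12, NA),
  Bidders   = c(10, 100, 3, 42, 0),
  LowestBid = c(1, 1, 1, 205, 1)
)

num_cols <- names(toy)[sapply(toy, is.numeric)]

# Per-variable counts of missing values and zeroes.
diagnostics <- data.frame(
  variable  = num_cols,
  n_missing = sapply(num_cols, function(v) sum(is.na(toy[[v]]))),
  n_zero    = sapply(num_cols, function(v) sum(toy[[v]] == 0, na.rm = TRUE)),
  row.names = NULL
)

# One histogram per numeric variable, drawn to a null device.
pdf(NULL)
for (v in num_cols) hist(toy[[v]], main = v, xlab = v)
invisible(dev.off())
```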

Univariate Descriptive Statistics

## 'data.frame':    1934 obs. of  20 variables:
##  $ AcadYear           : Factor w/ 8 levels "2011/2012","2012/2013",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Semester           : Factor w/ 2 levels "1","2": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Round              : Factor w/ 7 levels "1A","1B","1C",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ModuleCode         : Factor w/ 87 levels "PL1101E","PL2131",..: 1 1 2 2 3 3 4 4 5 5 ...
##  $ Group              : Factor w/ 4 levels "LEC1","LEC2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Quota              : num  95 430 5 12 35 35 28 50 25 22 ...
##  $ Bidders            : num  10 100 3 42 8 3 7 2 8 5 ...
##  $ LowestBid          : num  1 1 1 205 1 1 1 1 1 1 ...
##  $ LowestSuccessfulBid: num  1 1 1 977 1 1 1 1 1 1 ...
##  $ HighestBid         : num  500 1150 368 1255 500 ...
##  $ StudentAcctType    : Factor w/ 4 levels "New[P]","NUS[P]",..: 3 1 3 1 3 1 3 1 3 1 ...
##  $ Group1             : chr  "LECTURE 1" "LECTURE 1" "LECTURE 1" "LECTURE 1" ...
##  $ ClassNo            : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ModuleTitle        : Factor w/ 85 levels "Abnormal Psychology",..: 34 34 74 74 75 75 8 8 13 13 ...
##  $ DayText            : Factor w/ 5 levels "Monday","Tuesday",..: 1 1 3 3 2 2 2 2 3 3 ...
##  $ StartTime          : num  1800 1800 1600 1600 800 800 1200 1200 1400 1400 ...
##  $ Level              : Factor w/ 4 levels "Level 1","Level 2",..: 1 1 2 2 2 2 3 3 3 3 ...
##  $ BidPerQuota        : num  0.105 0.233 0.6 3.5 0.229 ...
##  $ Period             : Factor w/ 2 levels "Morning",">=Afternoon": 2 2 2 2 1 1 2 2 2 2 ...
##  $ Category           : Factor w/ 4 levels "Core","Elective",..: 1 1 1 1 1 1 1 1 1 1 ...
## Warning in describe(mydata): NAs introduced by coercion
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf

Bivariate Plots

  • Plots to illustrate pairwise relationships amongst variables.

Phase 5: Supplementary Graphs

Do fewer people bid for a module if the lecture begins in the morning (before 12pm)?

Let's look at each module and compare the average number of bidders, bidders per quota and lowest successful bid when the lecture begins in the morning versus in the afternoon or later.
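
A minimal sketch of that per-module comparison using `aggregate()` from base R. The data are made up and the module codes are placeholders. Only the column names follow the data structure shown earlier.

```r
# Toy data: two modules, each offered in the morning and in the afternoon.
toy <- data.frame(
  ModuleCode = c("PL0001", "PL0001", "PL0001", "PL0002", "PL0002"),
  Period     = c("Morning", ">=Afternoon", ">=Afternoon", "Morning", ">=Afternoon"),
  Bidders    = c(20, 30, 50, 10, 12),
  Quota      = c(40, 40, 40, 20, 20),
  LowestSuccessfulBid = c(1, 100, 300, 1, 1),
  stringsAsFactors = FALSE
)
toy$BidPerQuota <- toy$Bidders / toy$Quota

# Mean of each statistic for every module and period combination.
by_period <- aggregate(
  cbind(Bidders, BidPerQuota, LowestSuccessfulBid) ~ ModuleCode + Period,
  data = toy, FUN = mean
)
```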

Bonus: Multilevel Modeling
Peek Data

Do results from previous rounds…

Post

Module Bidding

My favourite part of university education was the ability to pick and choose modules. Excluding the compulsory modules, we were frequently spoiled for choice when it came to the electives. But this freedom came at a cost: we had to bid for the modules instead of simply being assigned them. The bidding system (CORS) was created to cope with the reality that certain modules were in higher demand, yet the modules had limited capacity. Students had to carefully ration their limited bid points, which were used to win auctions for desired modules. Do you go all-in on an extremely popular module and risk being stuck with no points to bid for the remaining modules? Or do you spread the risk and bid moderately on multiple modules that align with your interests?


Inadvertently, we began to evaluate the demand or popularity of each module to guide our bidding choices. We might even advise juniors or peers based on such evaluations. For example:


Psychological Therapies is the most popular module, so you need to plan ahead and stockpile points from previous semesters if you plan on bidding for it.


The above originated from observing peers grieve over their inability to secure a place in Psychological Therapies due to the exorbitant number of points required (which forced students to stockpile points from previous semesters). In contrast, I have never heard anyone claim that they really wanted to study Cognitive Neuroscience but failed to win the bid for it.
But was it true that Psychological Therapies was the most popular module? Rather than inferring popularity from personal anecdotes and observation, do we have data to support this claim? The answer is yes! Past bidding statistics and other module information are available at https://nusmods.com/api/, all thanks to the team at NUSMods, who created a great timetabling tool for all NUS students. With these data, we can pose the question more broadly and ask:

What were the most popular modules?

The information was downloaded, extracted, transformed, analysed and visualized using R. The code is available under the Codes tab above. The API contains extracted data for all modules from different majors and faculties, but I will focus only on Psychology modules in this post as I have greater familiarity with them.


Module Categories

For the typical Psychology major, there are broadly four categories of modules.

Category | Description
Core Modules | Modules that are required for all undergraduates. Includes PL1101E, PL2131, PL2132, and PL3232 to PL3236.
Level 3 Elective Modules | Modules that are outside of the core modules. All undergraduates must complete between four and six of these to graduate. Their module codes run from PL3237 to PL3260.
Level 3 Lab Modules | Lab modules are structured as individual or group research projects in a specific domain of Psychology. Every undergraduate is required to complete at least one of these modules. Their module codes take the form PL328x.
Level 4 Honor Modules | Modules that are required to graduate on the Honors track, usually taken near the end of the undergraduate degree. Between five and eight of these are required to graduate. Their module codes take the form PL4xxx.

Core modules were usually simple to get and would most likely be allocated to you in the Module Preference Exercise (more on that later…). Within the other three categories, what were the most popular modules?


Popularity

To proceed, we need some consensus on what popularity is and how to measure it. Luckily, the data contain bidding statistics that could serve as indicators of popularity. These are the key bidding statistics/variables:

  1. Quota
    • The maximum number of students allowed in the module.
  2. Bidders
    • The number of students who placed a bid on the module.
  3. Bidders Per Quota (BpQ)
    • The number of bidders for each available quota, \(BpQ = \frac{Bidders}{Quota}.\)
    • A value above 1 indicates that the module had more bidders than quota and a value below 1 indicates the opposite.
  4. Lowest Successful Bid (LSB)
    • The lowest bid that was allocated the module. Students who bid below this value were not allocated the module.

The bar graphs below illustrate the mean Quota, Bidders, BpQ and LSB of each module category, calculated across all modules, semesters and rounds. The different categories vary greatly in these statistics and in their importance to the undergraduate program, which makes it difficult to meaningfully compare popularity across categories.

We define a popular module as possessing the following characteristics in Round 1A (the first round of bidding):

  1. Maximum Quota available.
    • Some background on the bidding system: Round 1A is officially the first round but there is a Module Preference Exercise before Round 1A. In this exercise, all students declare the modules that they wish to study for the coming semester.
    • When the total number of students that wish to study a particular module is less than the quota (demand < supply), these students will be allocated the module for free. The unfilled quota will be up for bidding in Round 1A.
    • If the number of interested students exceeds the quota (demand > supply), no students will be allocated the module and the entire quota will be up for bidding. Popular modules are expected to fall into this scenario, thus their quota in Round 1A should be at a maximum.
  2. Number of Bidders exceeds the Quota.
  3. High LSB.
  4. High BpQ.

Modules that do not fit criteria 1 and 2 will not be considered popular. Amongst the modules that do fit both, criteria 3 and 4 will be used to determine which were most popular.
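
The Module Preference Exercise rule described under criterion 1 can be sketched as a small function. `round1a_quota` is a hypothetical name, and the text does not say what happens when demand exactly equals the quota. Treating that case like the oversubscribed one is an assumption.

```r
# Hypothetical helper: places left for bidding in Round 1A after the
# Module Preference Exercise.
round1a_quota <- function(quota, interested) {
  # Undersubscribed: interested students are allocated the module for free
  # and only the unfilled places go to bidding.
  # Otherwise: nobody is pre-allocated and the full quota goes to bidding.
  # (interested == quota is treated as the second case; this is an assumption.)
  ifelse(interested < quota, quota - interested, quota)
}

# A module with 50 places and 30 interested students, and one with 80.
remaining <- round1a_quota(quota = c(50, 50), interested = c(30, 80))
```

Under this sketch the undersubscribed module enters Round 1A with 20 places, while the oversubscribed module enters with all 50, which is why a full quota in Round 1A is used as a sign of popularity.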

  • The bar graphs below display the mean LSB of Honor, Lab and Elective modules in Round 1A, averaged across all academic years, semesters, lecture slots (for modules with multiple lecture slots) and account types.
  • Only modules with a median Quota of 40 and above (1.) and median BpQ more than 1 (2.) in Round 1A are displayed.
  • Hover over the respective bars to view other statistics such as the mean/median number of Bidders, Quota, BpQ and LSB.
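
The selection and ranking described in these bullets can be sketched in base R as follows. The data and module codes are invented for illustration. Only the thresholds (median Quota of at least 40, median BpQ above 1) and the ranking by mean LSB come from the text.

```r
# Toy Round 1A records for four placeholder modules (illustrative values).
toy <- data.frame(
  ModuleCode = rep(c("PL0001", "PL0002", "PL0003", "PL0004"), each = 2),
  Round      = "1A",
  Quota      = c(50, 40, 20, 20, 60, 60, 40, 40),
  Bidders    = c(80, 60, 30, 40, 30, 20, 50, 60),
  LowestSuccessfulBid = c(900, 700, 500, 600, 1, 1, 1000, 1200),
  stringsAsFactors = FALSE
)
toy$BidPerQuota <- toy$Bidders / toy$Quota

r1a <- toy[toy$Round == "1A", ]

# Per-module summaries.
med_quota <- tapply(r1a$Quota, r1a$ModuleCode, median)
med_bpq   <- tapply(r1a$BidPerQuota, r1a$ModuleCode, median)
mean_lsb  <- tapply(r1a$LowestSuccessfulBid, r1a$ModuleCode, mean)

# Criteria 1 and 2: large quota and more bidders than places.
popular <- names(med_quota)[med_quota >= 40 & med_bpq > 1]

# Rank the remaining modules by mean lowest successful bid.
ranking <- sort(mean_lsb[popular], decreasing = TRUE)
```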

Honor Modules


Lab Modules


Elective Modules

The elective modules were not filtered by median Quota or BpQ because, unlike the Lab and Honor modules, the Quota for electives varied greatly across modules. Here